Pruning closed itemset lattices for association rules
Abstract
Discovering association rules is one of the most important tasks in data mining, and many efficient algorithms have been proposed in the literature. The most notable are Apriori, Mannila's algorithm, Partition, Sampling and DIC, all of which are based on the Apriori mining method: pruning the subset lattice (itemset lattice). In this paper we propose an efficient algorithm, called Close, based on a new mining method: pruning the closed set lattice (closed itemset lattice). This lattice, which is a sub-order of the subset lattice, is closely related to Wille's concept lattice in formal concept analysis. Experiments comparing Close to an optimized version of Apriori showed that Close is very efficient for mining dense and/or correlated data such as census data (a difficult case), and performs reasonably well for market basket style data.
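As an illustration of what a closed itemset is, the following minimal Python sketch computes the closure of an itemset (the intersection of all transactions that contain it) and collects the frequent closed itemsets by brute force. It is not the Close algorithm described in the paper, which prunes the lattice level by level. The function names, the toy transactions and the `min_count` threshold are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a frozenset of items.
TRANSACTIONS = [
    frozenset("ACD"),
    frozenset("BCE"),
    frozenset("ABCE"),
    frozenset("BE"),
    frozenset("ABCE"),
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Intersection of all transactions containing `itemset`.

    An itemset is closed when it equals its own closure. For an itemset
    that no transaction contains, this sketch returns the itemset itself
    (a convention chosen here, not taken from the paper).
    """
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset(itemset)
    return frozenset.intersection(*covering)


def frequent_closed_itemsets(transactions, min_count):
    """Brute force: map each frequent itemset to its closure.

    An itemset and its closure are contained in exactly the same
    transactions, so they share the same support.
    """
    items = sorted(set().union(*transactions))
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_count:
                closed[closure(candidate, transactions)] = count
    return closed


if __name__ == "__main__":
    result = frequent_closed_itemsets(TRANSACTIONS, min_count=2)
    for itemset, count in sorted(result.items(), key=lambda kv: sorted(kv[0])):
        print("".join(sorted(itemset)), count)
```

On this toy data with `min_count=2`, the 15 frequent itemsets collapse to 5 closed ones, which shows why the closed itemset lattice can be much smaller than the itemset lattice when the data are correlated.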
Similar resources
Further Pruning for Efficient Association Rule Discovery
The Apriori algorithm’s frequent itemset approach has become the standard approach to discovering association rules. However, the computation requirements of the frequent itemset approach are infeasible for dense data and the approach is unable to discover infrequent associations. OPUS AR is an efficient algorithm for association rule discovery that does not utilize frequent itemsets and hence ...
Traversing Itemset Lattices with Statistical Metric Pruning
We study how to efficiently compute significant association rules according to common statistical measures such as a chi-squared value or correlation coefficient. For this purpose, one might consider using the Apriori algorithm, but the algorithm needs major conversion, because none of these statistical metrics are anti-monotone, and the use of higher support for reducing the search spa...
A lattice-based approach for mining most generalization association rules
Traditional association rules consist of some redundant information. Some variants based on support and confidence measures such as non-redundant rules and minimal non-redundant rules were thus proposed to reduce the redundant information. In the past, we proposed most generalization association rules (MGARs), which were more compact than (minimal) non-redundant rules in that they considered th...
Accelerating Closed Frequent Itemset Mining by Elimination of Null Transactions
The mining of frequent itemsets is often challenged by the length of the patterns mined and also by the number of transactions considered for the mining process. Another acute challenge that concerns the performance of any association rule mining algorithm is the presence of "null" transactions. This work proposes a closed frequent itemset mining algorithm viz., Closed Frequent Itemset Mining a...
Mining Non-Redundant Frequent Pattern in Taxonomy Datasets using Concept Lattices
In general, frequent itemsets are generated from large data sets by applying various association rule mining algorithms, which produce many redundant frequent itemsets. In this paper we propose a new framework for non-redundant frequent itemset generation using closed frequent itemsets, without loss of information, on taxonomy datasets using concept lattices. General Terms: Frequent Pattern, Assoc...